Topic: Scoring: Investigative vs. Evidentiary (Evolved from CQ for Sex... Post)
Barry C Member
posted 08-24-2005 12:37 PM
J.B., What I meant was that some people are clearly better off shifting their scoring criteria from "Investigative" to "Evidentiary," or to whatever system is necessary to "balance" the scorer's decisions. By balance, I mean IDing both DI and NDI at similar rates. Many examiners are good at IDing the DI, but lousy at IDing the NDI. If you look at the data in Krapohl's article on decision rules for paired testing, you'll see that 100% could ID the DI at well above chance rates (using traditional scoring rules). However, when you look at how well those same examiners ID'd the NDI, 40% were at or below chance levels. The data was clearly there to make correct decisions, as two examiners (all looking at the same data) correctly ID'd the NDI at a rate of 84%. Compare that to the one examiner who only ID'd 41.7%, and you'll have to agree the test is not really the problem; the scorer is.

When the same scorers' scores were adjusted for the then-proposed "Evidentiary Decision Rules," the group averages were 80.8% and 80.3% IDing the DI and NDI, respectively. Both were up from 78.3% and 61.5%. (Also, it's important to note the INCs went from 22.1% to 9.6%.) When applying the "Evidentiary Scoring Rules," examiners' decisions (as a group, anyhow) were more balanced, and that is what I meant by a need, for some, to change their scoring rules. As you correctly point out, though, an examiner really needs to have his scoring ability tested, as required under the Marin Protocol. When you look at the data, you notice some people's ability to correctly classify the DI decreases as the scoring rules are changed to better classify the NDI, which is to be expected, since perfection isn't possible. It seems to me an examiner would want to know where his decisions are "balanced" so he knows how much confidence to put into any decision.

Poly761, some scoring systems are prejudicial: if I score to the strongest CQ and a Backster examiner scores to the weakest, his scores must go in the more negative direction, a bias against the NDI. (That doesn't necessarily mean his final call will be wrong, but it's probably going to be a lot closer to wrong or INC than the former method in a close case. Backster also adjusts his cut-off scores to try to remedy the problem. It's interesting to note, though, that the Utah studies use +/-6, and they had an even distribution of DI and NDI.) It's also pretty clear from the data I discussed above that some examiners, for whatever reason, are biased against the truthful. (It may just be that the innocent react less to the CQs than the liars react to the RQs.) You apparently haven't heard of the Marin Protocol or Paired Testing, which is the cause of some confusion. Go to the following website for info, which should help you understand where we're coming from and see why we differentiate between "investigative" and "evidentiary" tests.
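To make the "balance" idea concrete, here is a minimal sketch in Python using only the group averages quoted above. Measuring balance as the absolute gap between the DI and NDI hit rates is my own illustration, not a metric from Krapohl's article.

code:
# Group averages from the post; "gap" (my term) is the difference
# between the DI and NDI detection rates under each rule set.
investigative = {"DI": 78.3, "NDI": 61.5, "INC": 22.1}
evidentiary = {"DI": 80.8, "NDI": 80.3, "INC": 9.6}

for name, r in (("Investigative", investigative), ("Evidentiary", evidentiary)):
    gap = abs(r["DI"] - r["NDI"])
    print(f"{name}: DI {r['DI']}%, NDI {r['NDI']}%, INC {r['INC']}%, gap {gap:.1f} pts")
# Investigative: gap 16.8 pts; Evidentiary: gap 0.5 pts

IP: Logged |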
J.B. McCloughan Member
posted 08-24-2005 02:52 PM
Barry, Thank you for the explanation. I assumed you were speaking of evidentiary and investigative but was not completely sure. I agree with you on the scorer issue, as earlier stated. By "more," I was referring to the examiner's ability to conduct a proper exam, another part of the Marin Protocol. The Marin Protocol is one of the procedures used for evidentiary polygraphs. The APA also has protocols for how to conduct such an exam. I believe Krapohl has suggested that the scoring system used for the Marin Protocol not be used for investigative exams, due to the problems about choosing errors mentioned in my earlier post. I agree with Poly761 that no errors are acceptable, but I doubt they can be eliminated altogether.

In a presentation at the APA this year with Krapohl, there were two studies of blind scorers using the investigative and the evidentiary scoring criteria (the Marin Protocol, in these studies). From what I can surmise from the data and the presentation, both studies showed a slight (less than 5%) increase in accuracy for DI, a significant (20% or greater) increase in accuracy for NDI, and a significant reduction, of approximately 20%, in the inconclusives for NDI as well. There was, however, an increase in errors for the DI, as much as 5%. The 'General Findings' stated: quote: Overall accuracies between Investigative and Evidentiary Decision rules were not significantly different. Accuracy for deceptive cases was not significantly different. EDRs (Evidentiary Decision Rules) produced balanced accuracy, lower inconclusive rates, and improved detection of the truthful. IDRs (Investigative Decision Rules) did significantly better with deceptive cases than truthful cases.
‘Cautions’ advised: quote: Only the single-issue three-question ZCT has been tested and replicated. EDRs may not be appropriate for investigative settings. Though no one’s overall accuracy dropped with EDRs, a small percentage did not improve.
Either way, I think that we all agree that one should not be conducting examinations without formal training on a given test and the scoring method(s) that are approved for use with it. It is not a ‘Best Practice’ to marry two of the aforementioned before they have at least first been tried and tested in the analog world.
[This message has been edited by J.B. McCloughan (edited 08-24-2005).] IP: Logged |
Barry C Member
posted 08-24-2005 03:25 PM
Yes, that's right. If you design your scoring criteria to catch more liars, you'll miss more truthful people. If you design it to catch more truthful people, you'll miss more liars. So, you need to determine what your goals are. If you are simply trying to discover who is involved in a particular crime, then you will want to adjust your cut-offs to catch the greatest number of them possible, which means you'll mislabel more truthful people.

So far, there has been one study, and that is what Don was talking about. The second (replication) study is underway now, but he might have mentioned it, as the findings are similar to the one he spoke on. Many examiners / departments choose their own cut-offs, as the numbers are, for the most part, pretty arbitrary. They choose the cut-offs depending upon how much error they are willing to tolerate, and in which direction. Personally, I think IDing people at chance rates or less is unacceptable, but an examiner needs to be tested to know where he stands. I believe I should be IDing the innocent as well as IDing the guilty. Don's point was that if you falsely label a truthful person as DI, there's little cost, as he's no worse off than he would have been before: he's still a suspect. (So, if you call everybody DI, you lose nothing, from an investigative standpoint.)

My Virginia School of Polygraph comrades use what they call a "Modified Backster" ZCT. It's the same format as the DoDPI ZCT, but they use +5 / -7 cut-offs with no spot score rules. Many of us do the same test (calling it a DoDPI test) with the traditional +/-6 and -3 spot score rules. Evidentiary rules use +4 / -6 regardless of spot scores; if INC, then a -3 or less in any spot results in a DI call. That's the two-stage approach Stu Senter wrote about a ways back in the APA Journal, and I think he also wrote the article, perhaps with Krapohl, on accuracy with different cut-offs. Both are very interesting articles.
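Here is a minimal sketch of those decision rules side by side. The function names and structure are mine, and the spot-score handling is my reading of "+/-6 and -3 spot score rules" as described in the post, so treat this as an illustration rather than any school's official rules.

code:
def traditional_zct(total, spots):
    # Traditional rules as described: +/-6 cut-offs on the grand
    # total, with a -3 or lower in any spot yielding a DI call.
    if any(s <= -3 for s in spots):
        return "DI"
    if total >= 6:
        return "NDI"
    if total <= -6:
        return "DI"
    return "INC"

def evidentiary_zct(total, spots):
    # Two-stage evidentiary rules as described: +4 / -6 on the total
    # regardless of spots; only if INC does a -3 spot yield DI.
    if total >= 4:
        return "NDI"
    if total <= -6:
        return "DI"
    if any(s <= -3 for s in spots):
        return "DI"
    return "INC"

print(traditional_zct(5, [2, 2, 1]))   # INC: short of +6
print(evidentiary_zct(5, [2, 2, 1]))   # NDI: the +4 cut-off pulls it in

IP: Logged |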
J.B. McCloughan Member
posted 08-24-2005 11:36 PM
Barry, I am not sure whether the replication study is completed or ongoing, but there were results reported at the APA, with ten participants from the Marin Protocol. I am not sure how many examiners or departments choose their own 'cut-offs'. You say at 'chance' rate, and I assume you are speaking of a study. If this is the case, I am wondering which study, and what the chance rate was? I am glad to see that you are for the Marin Protocol. Have you already done, or thought of doing, the certification process? [This message has been edited by J.B. McCloughan (edited 08-24-2005).] IP: Logged |
Barry C Member
posted 08-25-2005 08:26 AM
J.B., I have done the scoring for the Marin Protocol. I have to send out a tape for eval, which is sitting on my desk, so I should have been done by now, but I'm not.

The second study is underway. Don asked if I'd be willing to work on it with him, which I agreed to do. We haven't sat down and compared notes, so I've been pretty guarded with the data. The first study he's circulated, and I used those numbers in my posts. (The data for the others has been around for a while, and Don has mentioned it appears to replicate the earlier data, which it does.)

When I say "chance," I mean examiners were given X amount of exams to score in which ground truth was known to be NDI, and they ID'd about half of them correctly, which is what you'd get if you just flipped coins instead of using polygraph. Our overall numbers look pretty good because we ID the DI at very high rates, some at 98%, which tells us our scoring rules are unbalanced. That makes sense, though, as (NDI) people only respond to CQs at about 75% of the intensity with which (DI) people respond to RQs. (I've yet to figure out why the Utah studies worked out at +/-6 catching the DI and NDI equally, but they did.)

But to make my point more clearly: one examiner ID'd 98% of the DI (2% INC), but only 42% of the NDI (and 40% INC), with an overall accuracy of 88% (without INCs). That looks pretty good until you realize you don't want to go to him if you're telling the truth! With Marin rules, he ID'd the NDI at 73%. Note these are not correct decisions, only what percentage he actually ID'd. Most of the misses were because he called them INC. My point is the same, though: many examiners fail to ID the NDI with traditional scoring rules. I hope that makes my points more clear.
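To see how an 88% "overall" figure can coexist with a 42% NDI hit rate, here is the arithmetic on an assumed 50/50 batch of 100 cases. The even split is an assumption for illustration only; as the later posts note, this examiner's actual group was not 50/50.

code:
# Per-class rates from the post, applied to 50 DI and 50 NDI cases.
di_correct = round(0.98 * 50)    # 49 DI correctly called DI
di_inc = round(0.02 * 50)        # 1 DI inconclusive, 0 errors
ndi_correct = round(0.42 * 50)   # 21 NDI correctly called NDI
ndi_inc = round(0.40 * 50)       # 20 NDI inconclusive
ndi_wrong = 50 - ndi_correct - ndi_inc   # 9 false positives

decisions = di_correct + ndi_correct + ndi_wrong   # 79 calls made
overall = (di_correct + ndi_correct) / decisions   # 70 / 79
print(f"{overall:.1%}")  # ~88.6% overall, despite the 42% NDI rate

IP: Logged |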
Poly761 Member
posted 08-25-2005 09:18 AM
After reviewing (some) data about the Marin Protocol and Paired Testing, I return to an old issue. An attorney's client in a criminal case goes to a private examiner before a police examination. Results for both should be the same, good or bad. If the results are not in the best interest of the client (or) attorney, the client will probably never see the inside of a police polygraph office. I foresee the same happening with paired testing. If I'm an involved attorney, I want to know what the results will be (in advance). In a major civil ($) issue, I don't expect any attorney will permit the outcome of their case to turn on a polygraph examination. Once the attorney knows there is a "problem" with their client, they will likely refuse a paired-testing option and change the direction of their case.

While I can see the value of paired testing, I don't see this concept getting off the ground any time soon. Paired testing can't be made mandatory, and I believe the necessary legislative changes won't get far. Typical of "stipulated" cases in a criminal issue, someone is going to get the short end of the stick once results are known - end of "stipulation." One of the main objections I have to paired testing, as I understand this concept, is an examiner (can't) defend their work if their opinion is "over-ridden." Again, I don't know of any attorney that would accept this (negative) opinion against their client and agree it can't be challenged.

I am in total agreement with a certification process. Many of us are now "Certified" after meeting additional requirements beyond graduating from our polygraph school. What Marin is proposing requires higher standards that attempt to ensure demonstrated accuracy in all of our work. Once we, those in the polygraph profession, enact requirements such as those proposed by Marin, we (might) start opening the door to courtrooms and acceptance by the legal community. Think back to the 80's, when there were many examiners who opposed licensing. Hatch and Kennedy legislated what we (examiners) would not do ourselves: "police" our own profession. I suspect the type of certification proposed by Marin may only be sought by those who want to get involved in the paired-testing system. END..... IP: Logged |
Barry C Member
posted 08-25-2005 10:12 AM
Poly761, The Marin Protocol is a voluntary process designed to keep liars out of court. ASTM has the entire paired-testing process spelled out. It's lengthy, but it had to be. According to Marin, there are a few bar associations waiting for certified examiners so they can jump on board. The whole point of paired testing (and some are doing it already) is figuring out who's lying when you have two stories (he said, she said) in which at least one must be lying. The suggestion in your case is this: the party who takes the test and passes is allowed to testify on the subject; the party who refuses the test can't testify about that topic. If they both test and one passes and one fails, then the failing party can't testify to the disputed info.

The results could still be challenged, but that's where the math comes in. If two examiners have demonstrated 86% accuracy, then the chance of them getting down to who's lying and being wrong is only about two percent - well better than chance, and even beyond a reasonable doubt. (You also avoid the base rate debate, because if one said the sky was skyblue-pink and the other said it wasn't, then it's pretty clear.)

I think you will see civil courts at least come on board rather quickly, as the standard of proof is only a preponderance of the evidence, which paired testing meets easily. If it's going to cost an attorney money to test drive his client only to say no to a stipulation, which could result in a judge (and there are some willing to do so) saying the guy can't testify on the subject (he could on other things, but flush the case good-bye), then paired testing achieved its goal anyhow. Score one for truth! (Even though it was a secret test for the attorney, the guy still failed and can't (or won't) make it through the paired-testing protocol.)

You say that the results should be the same, good or bad, but the data doesn't bear that out. If the guy's truthful, many examiners would call him INC or DI, so there would be a benefit in using a (paired-testing) certified examiner. Keep in mind Marin (and ASTM) chose 86% because that's what the NAS study found was the average accuracy rate. If those numbers accurately represent the examiners in this country - and they seem to, from the many studies I've read - then half the examiners in the US don't achieve that level. Even if the protocol never takes off - though I believe it will - I hope it is a catalyst for improving examiners' techniques, scoring, etc. Anything that moves us ahead, I'm for.

For those examiners who don't score well enough for certification, the fix is pretty simple: teach them how to score charts at an 86% rate or better. Many examiners are afraid of twos and threes, but that could be the difference between a decision and an INC. If they're not running tests correctly, the fix is pretty easy for that too (in most cases): use a checklist of proper procedures. (Honts and Raskin, in their chapter in Kleiner, spell out exactly how to run a scientifically sound exam.) I think I'm rambling now, sorry.
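The "about two percent" appears to come from squaring the individual error rate: with two examiners each demonstrating 86% accuracy, both would have to err on the same pair for the protocol to point at the wrong party. A minimal sketch, with the independence of the two errors being my assumption:

code:
accuracy = 0.86          # the NAS average cited in the post
error = 1 - accuracy     # 0.14 per examiner

# Probability both certified examiners err on the same pair,
# assuming independent errors (my assumption).
both_wrong = error ** 2
print(f"{both_wrong:.1%}")  # ~2.0%, the "about two percent" above

IP: Logged |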
J.B. McCloughan Member
posted 08-25-2005 11:18 AM
Barry, I am at the same stage with the Marin Protocol certification and also need to send in a recording. I guess I would need to know the base rate for each group, DI and NDI, but here are the numbers assuming 100 cases with an equal distribution of 50 per group (DI/NDI). If you exclude the inconclusives, the individual you reported would have correctly identified 48 deceptive and 13 truthful individuals. He or she would have incorrectly identified 1 deceptive as truthful and 17 truthful as deceptive. There would have been 1 truly deceptive person and 20 truly truthful persons found inconclusive. If those are the numbers and my math is correct, I would get mostly the same results as you did:

True Positives = 48, False Positives = 17, True Negatives = 13, False Negatives = 1
Sensitivity = .9796
Specificity = .4333
False Positive Probability = .5667
Positive Predictive Value = .7385
Negative Predictive Value = .9286

In this case, and after dropping inconclusives, we are left with an N of 79. Of those 79, 61 are correct and 18 are incorrect decisions. So the basic overall correct-decision rate would be .7722, or 77%. However, we have a change in base rate distribution now, as we only have .3797 (38%) NDI and .6203 (62%) DI calls possible instead of 50/50. If one were to simply opine all of the remaining exams as deceptive, he or she would be correct 62% of the time. Is this what you have?
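These figures can be checked mechanically from the four cell counts, using the standard definitions (with inconclusives excluded, as in the post). A minimal sketch:

code:
tp, fp, tn, fn = 48, 17, 13, 1     # cell counts from the post

sensitivity = tp / (tp + fn)       # 48/49 = .9796
specificity = tn / (tn + fp)       # 13/30 = .4333
fp_prob = fp / (tn + fp)           # 17/30 = .5667
ppv = tp / (tp + fp)               # 48/65 = .7385
npv = tn / (tn + fn)               # 13/14 = .9286

n = tp + fp + tn + fn              # 79 decisions after dropping INCs
overall = (tp + tn) / n            # 61/79 = .7722
truly_di = (tp + fn) / n           # 49/79 = .6203: calling everyone
                                   # deceptive would be right 62% of the time
print(sensitivity, specificity, fp_prob, ppv, npv, overall, truly_di)

IP: Logged |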
Barry C Member
posted 08-25-2005 12:28 PM
You're gonna make me work at this, aren't you? I didn't crunch the data for that study, but I have the results. I treated it as 50/50 for the sake of making my point, but it's clear from the numbers the particular examiner in question was not in a 50/50 group. The bottom line is he only identified the truthful 41.7% of the time, but the DI 98% of the time. (The scorer in my example actually called 98% of the DIs DI, with 2% INC and 0% errors. He (or she) called 41.7% of the NDIs NDI, 18.8% wrong (DI), and 39.6% INC.) So, if you plug those in as 50/50 (which is what we'd expect of the examiner in a 50/50 situation), he'd ID 21 (ish) of 50 NDIs as NDI. He'd fail to make a decision on 20 (INCs, but not errors, I agree), and he'd mislabel 9 NDIs as DI (false positives).

The actual data is based on two scorers scoring 50/50 (DI/NDI) for that study; three scorers scoring 61/39; three scorers scoring 50/50 (old data from another study); and another examiner from one of Capps' and Ansley's projects scoring 52/48. The average for all 10 was as follows:

DI: 78.3% correctly called, 6.8% errors, 14.9% inconclusive
NDI: 61.5% correctly called, 9.3% errors, 29.2% inconclusive

My point is the data is there to pull those NDI INCs into decision range. You can either score more liberally on positive scores, or you can adjust your scoring rules (which benefited most examiners). Using Evidentiary Scoring Rules, the same data yielded the following results:

DI: 80.8% correctly called (an increase), 10.4% errors (an increase), 8.9% inconclusive (a decrease)
NDI: 80.3% correctly called (a substantial increase), 9.3% errors (no change), 10.5% inconclusive (a substantial decrease)

As you can see, the "cost" appears to be a slight increase in false negatives; it did not reach statistical significance, but it was close. The benefit is our examiner (above) would now have changed nothing in the DI arena, but he would have correctly ID'd 36 or 37 (72.9%) of the NDIs, missed 6 (12.5%), and not called 7 (14.6%). Which is more acceptable? I think it does depend on your goal (to some extent, anyhow): investigative or evidentiary. In this examiner's case (which will vary from examiner to examiner), he only wins. Your thoughts?
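For that examiner, here is a minimal sketch converting the per-class NDI rates into expected case counts per 50 truthful examinees. The 50-case group size is inferred from the "36 or 37" above (72.9% of 50 is 36.45, which rounds either way); the rounding is mine.

code:
n_ndi = 50  # truthful examinees, per the 50/50 framing above

investigative = {"correct": 0.417, "error": 0.188, "inc": 0.396}
evidentiary = {"correct": 0.729, "error": 0.125, "inc": 0.146}

for label, rates in (("Investigative", investigative), ("Evidentiary", evidentiary)):
    counts = {call: round(rate * n_ndi) for call, rate in rates.items()}
    print(label, counts)
# Investigative: {'correct': 21, 'error': 9, 'inc': 20}
# Evidentiary:   {'correct': 36, 'error': 6, 'inc': 7}   (36.45 raw)

IP: Logged |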
Barry C Member
posted 08-26-2005 07:50 AM
Ebvan, You asked / stated the following: "Barry, could you explain how a symptomatic affects score? Also, where can I find an authoritative article establishing symptomatics don't work or lower the scores of the truthful? Can you agree that a strong reaction to a symptomatic is indicative of an outside-issue problem, even though lack of reaction does not mean that there isn't an outside-issue problem?"

Honts and Amato did a study and found that examiners could not ID outside issues at greater than chance rates, and they found tests with OI questions lowered the scores of the truthful, so no, I don't believe a strong reaction to an OI question is at all meaningful. (They did find they make good CQs, but not as good as PLCQs.) (There have been other studies as well. Capps did one some time ago in which he said they worked, but Krapohl crunched the numbers and found Capps' conclusions to be invalid, so there are no studies supporting that they do what they are supposed to. Many ask if there would even be an outside issue if the examiner didn't introduce the possibility in the first place.)

If many examiners strictly adhered to the "rules" governing a test, then many wouldn't be eligible for paired-testing certification, as they do too poorly with those rules. The scientific evidence is in: any test can be scored with any validated scoring system, which means DoDPI or Utah (or perhaps OSS and the Kircher criteria). The cut-offs are supposed to be based on probabilities, which means you need to determine what your acceptable rate of error is. DoDPI and Utah each chose +/-6. With Utah, they catch the true NDI and DI at the same rates. With the DoDPI criteria, the cut-offs are not symmetrical. If you want them to be more symmetrical, use the Marin or Evidentiary cut-offs, but I suggest being tested first so that you're not guessing, as most would be.

Is a DoDPI ZCT a "Billy Bob" test if a person scores it with OSS? Of course not, as OSS was designed (as was APL et al) to score those tests, and the research shows it's better than examiners' hand scores. Now, what cut-offs would you use? OSS scores are based on probabilities, and it's up to the examiner to choose error rates and determine what the cut-off scores are. +/-8 will get you to 95%, but you could use +/- anything you choose if you don't like that rate. Do you really think you catch people with a +/-6 at the same rate as every other examiner? Again, of course not, which means there is no magic about those figures. Just look at the data above.

At our last association training meeting I put up charts in which ground truth was known and asked people to score them. We had something along the lines of a -2 and a +12 on one test (by different examiners, obviously), and we had one that some called DI, some NDI, and some INC - and we all had the same data in front of us. Don has done the same thing at APA and AAPP seminars.

The Honts / Amato study (done for DoDPI) is called "Validity of Outside-Issue Questions in the Control Question Test." I have a copy if you can't find it, but here's a link to the abstract if that's all you care to read: http://www.stormingmedia.us/66/6666/A666673.html
IP: Logged |
ebvan Member
posted 08-26-2005 03:56 PM
Barry, If you are willing and able, I would like a copy of the study. After reading the abstract, my curiosity is somewhat aroused. Based on the abstract, it seems that outside issues may not be relied upon to always manifest themselves from the asking of a Symptomatic or Outside Issue Question. Nothing in the abstract indicates that reactions to SQs should be ignored. Email me directly at ebvan@swbell.net if you need a mailing address, etc. Since I haven't looked at the study yet, for the purpose of this question I am willing to accept that outside issues may be difficult or impossible to detect using a symptomatic question. What would you do if you were giving a DoDPI or Backster Zone including SQs and you observed a consistent, timely, and significant reaction to either or both SQs, while your comparison scoring of RQ vs. CQ resulted in an NDI or INC call? Would you simply ignore the SQ reaction, or would you inquire further? Please understand I am not trying to be argumentative; I am merely trying to educate myself on a limited budget.
Thanks for your help. I always find your posts informative. In the words of Cletus the slack-jawed yokel, "You agitate my brainy bone" and make me think. ebvan IP: Logged |
Barry C Member
posted 08-26-2005 06:19 PM
I don't use them (and I'm not alone), so this is hypothetical. If the test was NDI, I'd call it that way. If INC, I wouldn't focus on any OI, as I would want to retest him. I'd ask what he'd been thinking about in general, but I wouldn't make myself a bigger problem by focusing in on it. I'd then (later, most likely) re-test, make sure he understands the RQs, and sell the CQs like my life depended on it. If that didn't work, I'd send him to somebody else.

Think about the whole point of the test, though. The idea is to see where the examinee's focus goes, CQs or RQs. Why introduce a third area of concern that doesn't have to be there? You're now adding another issue to the test. (That's why Backster calls it the black "zone." I want two: red and green.) (You don't use them in the Air Force, Army, Navy, or SS MGQTs. Why not?) I understand Backster's original concern, and I appreciate the attempted fix, but science hasn't supported his theory.

You're right: nothing in the study says the OI questions should be ignored. They can be scored as CQs (though they're weaker than PLCQs). There's also nothing in the study that supports that a strong response to an OIQ has any correlation to an OI, so you can't conclude that with any certainty.

I'm home, and the article is on my work computer, so I'll email it Monday. Somebody here probably has it, so perhaps you'll get it sooner. If so, let me know. I've also got the APA Journal studies (somewhere) in PDF format, which end with Don showing Capps was wrong in his conclusion that OIQs are effective. IP: Logged |
J.B. McCloughan Member
posted 08-26-2005 06:21 PM
ebvan, If I, quote: ...observed a consistent, timely, and significant reaction to either or both SQ's
My first thought would be that the subject is attempting countermeasures. I personally still use the Symptomatic questions for that very purpose. It introduces two more questions that the subject, who may be planning on attempting countermeasures, has to identify. [This message has been edited by J.B. McCloughan (edited 08-26-2005).] IP: Logged |
ebvan Member
posted 08-27-2005 06:29 AM
Thanks. I'll reserve further questions and comments until I have a chance to read the study. IP: Logged |
ebvan Member
posted 08-29-2005 04:01 PM
Barry, Thanks for the information. I find the numbers in the research very convincing. Plus, there was an obvious flaw or two in the Capps study. Using a guilty plea to establish ground truth is one of the more obvious. Failure to consider false positives and false negatives more thoroughly was another. (Of course, I was just reading an abstract.) As I understand the gist of the research:

#1 Symptomatic questions do not reliably detect outside issues.
#2 Use of Symptomatic questions may increase the possibility of a False Positive.
#3 Symptomatic questions may function as diluted PLCs for comparison purposes.

I agree that numbers 2 and 3 require further research. The question that still comes to mind is this: by accepting that SQs have some value for comparison, don't you also have to accept that a reaction to an SQ is indicative of a probable lie? Therefore, logically, it should also be evidence of the existence of an outside issue. If it is not evidence of an outside issue, shouldn't it be considered an orienting response to the nature of the question and unusable for comparison? I am thoroughly enjoying this discussion. Please excuse the spelling and grammar errors. It's Quittin' Time. ------------------ but then, that's just one man's opinion
IP: Logged |
Barry C Member
posted 08-29-2005 06:44 PM
It could be that the guy is asking himself why you are asking that question, so it's not a probable lie, just a little confusion. Remember, it didn't matter if there was or wasn't an outside issue for people to respond, so how would those without outside issues be lying? (Not that we could ever know for sure.) Is the truthful person COMPLETELY convinced you will not ask an unreviewed question once you bring up the possibility? It's like asking if the person is sure he told you the complete truth. Once you do that, he's not sure. Not a lie, but not sure, which is the other way a PLCQ works. You wouldn't use them for CQs; it was just an observation they made. Note the OIQs had no effect on the liars, just the truthful, which is not what the OI proponents would expect. IP: Logged |